AITopics | mean performance

Collaborating Authors

mean performance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RoSHAP: A Distributional Framework and Robust Metric for Stable Feature Attribution

Xiang, Lanxin, Shi, Liang, Ye, Youhui, Jiang, Boyu, Zhou, Dawei, Guo, Feng

arXiv.org Machine LearningMay-15-2026

Feature attribution analysis is critical for interpreting machine learning models and supporting reliable data-driven decisions. However, feature attribution measures often exhibit stochastic variation: different train--test splits, random seeds, or model-fitting procedures can produce substantially different attribution values and feature rankings. This paper proposes a framework for incorporating stochastic nature of feature attribution and a robust attribution metric, RoSHAP, for stable feature ranking based on the SHAP metric. The proposed framework models the distribution of feature attribution scores and estimates it through bootstrap resampling and kernel density estimation. We show that, under mild regularity conditions, the aggregated feature attribution score is asymptotically Gaussian, which greatly reduces the computational cost of distribution estimation. The RoSHAP summarizes the distribution of SHAP into a robust feature-ranking criterion that simultaneously rewards features that are active, strong, and stable. Through simulations and real-data experiments, the proposed framework and RoSHAP outperform standard single-run attribution measures in identifying signal features. In addition, models built using RoSHAP-selected features achieve predictive performance comparable to full-feature models while using substantially fewer predictors. The proposed RoSHAP approach improves the stability and interpretability of machine learning models, enabling reliable and consistent insights for analysis.

artificial intelligence, machine learning, selection performance comparison, (15 more...)

arXiv.org Machine Learning

2605.15154

Country: North America > United States (0.47)

Genre: Research Report > Experimental Study (0.69)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

Add feedback

Outcome-based Exploration for LLM Reasoning

Song, Yuda, Kempe, Julia, Munos, Remi

arXiv.org Artificial IntelligenceSep-9-2025

Reinforcement learning (RL) has emerged as a powerful method for improving the reasoning abilities of large language models (LLMs). Outcome-based RL, which rewards policies solely for the correctness of the final answer, yields substantial accuracy gains but also induces a systematic loss in generation diversity. This collapse undermines real-world performance, where diversity is critical for test-time scaling. We analyze this phenomenon by viewing RL post-training as a sampling process and show that, strikingly, RL can reduce effective diversity even on the training set relative to the base model. Our study highlights two central findings: (i) a transfer of diversity degradation, where reduced diversity on solved problems propagates to unsolved ones, and (ii) the tractability of the outcome space, since reasoning tasks admit only a limited set of distinct answers. Motivated by these insights, we propose outcome-based exploration, which assigns exploration bonuses according to final outcomes. We introduce two complementary algorithms: historical exploration, which encourages rarely observed answers via UCB-style bonuses, and batch exploration, which penalizes within-batch repetition to promote test-time diversity. Experiments on standard competition math with Llama and Qwen models demonstrate that both methods improve accuracy while mitigating diversity collapse. On the theoretical side, we formalize the benefit of outcome-based exploration through a new model of outcome-based bandits. Together, these contributions chart a practical path toward RL methods that enhance reasoning without sacrificing the diversity essential for scalable deployment.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.06941

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

7eb7eabbe9bd03c2fc99881d04da9cbd-Supplemental.pdf

Neural Information Processing SystemsAug-15-2025, 10:53:27 GMT

intervention, ispn, mean performance, (14 more...)

Neural Information Processing Systems

Country: Asia (0.06)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

Add feedback

Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model

Patwardhan, Sushrut, Ramachandra, Raghavendra, Venkatesh, Sushma

arXiv.org Artificial IntelligenceAug-15-2025

Morphing attack detection has become an essential component of face recognition systems for ensuring a reliable verification scenario. In this paper, we present a multimodal learning approach that can provide a textual description of morphing attack detection. We first show that zero-shot evaluation of the proposed framework using Contrastive Language-Image Pretraining (CLIP) can yield not only generalizable morphing attack detection, but also predict the most relevant text snippet. We present an extensive analysis of ten different textual prompts that include both short and long textual prompts. These prompts are engineered by considering the human understandable textual snippet. Extensive experiments were performed on a face morphing dataset that was developed using a publicly available face biometric dataset. We present an evaluation of SOT A pre-trained neural networks together with the proposed framework in the zero-shot evaluation of five different morphing generation techniques that are captured in three different mediums.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-93694-4_14

2508.1011

Country: Europe (0.14)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Add feedback

Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability

Haase, Jennifer, Hanel, Paul H. P., Pokutta, Sebastian

arXiv.org Artificial IntelligenceApr-18-2025

Following the widespread adoption of ChatGPT in early 2023, numerous studies reported that large language models (LLMs) can match or even surpass human performance in creative tasks. However, it remains unclear whether LLMs have become more creative over time, and how consistent their creative output is. In this study, we evaluated 14 widely used LLMs -- including GPT-4, Claude, Llama, Grok, Mistral, and DeepSeek -- across two validated creativity assessments: the Divergent Association Task (DAT) and the Alternative Uses Task (AUT). Contrary to expectations, we found no evidence of increased creative performance over the past 18-24 months, with GPT-4 performing worse than in previous studies. For the more widely used AUT, all models performed on average better than the average human, with GPT-4o and o3-mini performing best. However, only 0.28% of LLM-generated responses reached the top 10% of human creativity benchmarks. Beyond inter-model differences, we document substantial intra-model variability: the same LLM, given the same prompt, can produce outputs ranging from below-average to original. This variability has important implications for both creativity research and practical applications. Ignoring such variability risks misjudging the creative potential of LLMs, either inflating or underestimating their capabilities. The choice of prompts affected LLMs differently. Our findings underscore the need for more nuanced evaluation frameworks and highlight the importance of model selection, prompt design, and repeated assessment when using Generative AI (GenAI) tools in creative contexts.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.1232

Country:

North America > United States (1.00)
Europe (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.48)

Industry:

Education (1.00)
Leisure & Entertainment (0.93)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)

Add feedback

EvoSampling: A Granular Ball-based Evolutionary Hybrid Sampling with Knowledge Transfer for Imbalanced Learning

Pei, Wenbin, Dai, Ruohao, Xue, Bing, Zhang, Mengjie, Zhang, Qiang, Cheung, Yiu-Ming, Xia, Shuyin

arXiv.org Artificial IntelligenceDec-12-2024

Class imbalance would lead to biased classifiers that favor the majority class and disadvantage the minority class. Unfortunately, from a practical perspective, the minority class is of importance in many real-life applications. Hybrid sampling methods address this by oversampling the minority class to increase the number of its instances, followed by undersampling to remove low-quality instances. However, most existing sampling methods face difficulties in generating diverse high-quality instances and often fail to remove noise or low-quality instances on a larger scale effectively. This paper therefore proposes an evolutionary multi-granularity hybrid sampling method, called EvoSampling. During the oversampling process, genetic programming (GP) is used with multi-task learning to effectively and efficiently generate diverse high-quality instances. During the undersampling process, we develop a granular ball-based undersampling method that removes noise in a multi-granular fashion, thereby enhancing data quality. Experiments on 20 imbalanced datasets demonstrate that EvoSampling effectively enhances the performance of various classification algorithms by providing better datasets than existing sampling methods. Besides, ablation studies further indicate that allowing knowledge transfer accelerates the GP's evolutionary learning process.

dataset, evosampling, ieee transaction, (15 more...)

arXiv.org Artificial Intelligence

2412.10461

Country:

Asia > China > Liaoning Province > Dalian (0.05)
Asia > China > Chongqing Province > Chongqing (0.05)
Asia > China > Hong Kong (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Optimizing Performance on Trinity Utilizing Machine Learning, Proxy Applications and Scheduling Priorities

Romero, Phil

arXiv.org Artificial IntelligenceMar-15-2024

The sheer number of nodes continues to increase in todays supercomputers, the first half of Trinity alone contains more than 9400 compute nodes. Since the speed of todays clusters are limited by the slowest nodes, it more important than ever to identify slow nodes, improve their performance if it can be done, and assure minimal usage of slower nodes during performance critical runs. This is an ongoing maintenance task that occurs on a regular basis and, therefore, it is important to minimize the impact upon its users by assessing and addressing slow performing nodes and mitigating their consequences while minimizing down time. These issues can be solved, in large part, through a systematic application of fast running hardware assessment tests, the application of Machine Learning, and making use of performance data to increase efficiency of large clusters. Proxy applications utilizing both MPI and OpenMP were developed to produce data as a substitute for long runtime applications to evaluate node performance. Machine learning is applied to identify underperforming nodes, and policies are being discussed to both minimize the impact of underperforming nodes and increase the efficiency of the system. In this paper, I will describe the process used to produce quickly performing proxy tests, consider various methods to isolate the outliers, and produce ordered lists for use in scheduling to accomplish this task.

application, node, outlier, (15 more...)

arXiv.org Artificial Intelligence

2404.10617

Country:

North America > United States > New Mexico > Los Alamos County > Los Alamos (0.05)
North America > United States > Tennessee > Knox County > Knoxville (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.64)

Industry:

Energy (0.69)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Effects of diversity incentives on sample diversity and downstream model performance in LLM-based text augmentation

Cegin, Jan, Pecher, Branislav, Simko, Jakub, Srba, Ivan, Bielikova, Maria, Brusilovsky, Peter

arXiv.org Artificial IntelligenceJan-12-2024

The latest generative large language models (LLMs) have found their application in data augmentation tasks, where small numbers of text samples are LLM-paraphrased and then used to fine-tune the model. However, more research is needed to assess how different prompts, seed data selection strategies, filtering methods, or model settings affect the quality of paraphrased data (and downstream models). In this study, we investigate three text diversity incentive methods well established in crowdsourcing: taboo words, hints by previous outlier solutions, and chaining on previous outlier solutions. Using these incentive methods as part of instructions to LLMs augmenting text datasets, we measure their effects on generated texts' lexical diversity and downstream model performance. We compare the effects over 5 different LLMs and 6 datasets. We show that diversity is most increased by taboo words, while downstream model performance is highest when previously created paraphrases are used as hints.

dataset, diversity, diversity incentive method, (14 more...)

arXiv.org Artificial Intelligence

2401.06643

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.04)
(8 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Probabilistic Offline Policy Ranking with Approximate Bayesian Computation

Da, Longchao, Jenkins, Porter, Schwantes, Trevor, Dotson, Jeffrey, Wei, Hua

arXiv.org Artificial IntelligenceDec-17-2023

In practice, it is essential to compare and rank candidate policies offline before real-world deployment for safety and reliability. Prior work seeks to solve this offline policy ranking (OPR) problem through value-based methods, such as Off-policy evaluation (OPE). However, they fail to analyze special cases performance (e.g., worst or best cases), due to the lack of holistic characterization of policies performance. It is even more difficult to estimate precise policy values when the reward is not fully accessible under sparse settings. In this paper, we present Probabilistic Offline Policy Ranking (POPR), a framework to address OPR problems by leveraging expert data to characterize the probability of a candidate policy behaving like experts, and approximating its entire performance posterior distribution to help with ranking. POPR does not rely on value estimation, and the derived performance posterior can be used to distinguish candidates in worst, best, and average-cases. To estimate the posterior, we propose POPR-EABC, an Energy-based Approximate Bayesian Computation (ABC) method conducting likelihood-free inference. POPR-EABC reduces the heuristic nature of ABC by a smooth energy function, and improves the sampling efficiency by a pseudo-likelihood. We empirically demonstrate that POPR-EABC is adequate for evaluating policies in both discrete and continuous action spaces across various experiment environments, and facilitates probabilistic comparisons of candidate policies before deployment.

candidate policy, evaluation, popr-eabc, (14 more...)

arXiv.org Artificial Intelligence

2312.11551

Country:

North America > United States > Arizona (0.04)
Europe > Italy > Apulia > Bari (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

Add feedback

ExPT: Synthetic Pretraining for Few-Shot Experimental Design

Nguyen, Tung, Agrawal, Sudhanshu, Grover, Aditya

arXiv.org Artificial IntelligenceOct-30-2023

Experimental design is a fundamental problem in many science and engineering fields. In this problem, sample efficiency is crucial due to the time, money, and safety costs of real-world design evaluations. Existing approaches either rely on active data collection or access to large, labeled datasets of past experiments, making them impractical in many real-world scenarios. In this work, we address the more challenging yet realistic setting of few-shot experimental design, where only a few labeled data points of input designs and their corresponding values are available. We approach this problem as a conditional generation task, where a model conditions on a few labeled examples and the desired output to generate an optimal input design. To this end, we introduce Experiment Pretrained Transformers (ExPT), a foundation model for few-shot experimental design that employs a novel combination of synthetic pretraining with in-context learning. In ExPT, we only assume knowledge of a finite collection of unlabelled data points from the input domain and pretrain a transformer neural network to optimize diverse synthetic functions defined over this domain. Unsupervised pretraining allows ExPT to adapt to any design task at test time in an in-context fashion by conditioning on a few labeled data points from the target task and generating the candidate optima. We evaluate ExPT on few-shot experimental design in challenging domains and demonstrate its superior generality and performance compared to existing methods. The source code is available at https://github.com/tung-nd/ExPT.git.

expt, expt 0, optimization, (14 more...)

arXiv.org Artificial Intelligence

2310.19961

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback